System and method for electronic circuit simulation

ABSTRACT

A system and method transform a model of an electronic circuit to improve simulation speed and/or reduce emulation area. The model may include storage elements; one or more of these storage elements may be represented by dense memory, and the storage elements may be represented by references thereto.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of, and incorporates by reference herein in their entirety, U.S. Provisional Patent Applications Ser. No. 63/235,287 and Ser. No. 63/235,283, both filed Aug. 20, 2021, in the name of Steven F. Hoover.

BACKGROUND

An electronic circuit, such as a microprocessor, application-specific integrated circuit (ASIC), or any other such circuit, may be designed by first describing it using a hardware-description language (HDL), such as the Very-High-Speed Hardware-Description Language (VHDL). The HDL description may be verified for correct operation by using it to create a corresponding simulation of the electronic circuit. The simulation may be tested by applying patterns of signals to inputs of all or part of the programmable computer hardware and by observing its corresponding output signals.

BRIEF DESCRIPTION OF DRAWINGS

For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.

FIGS. 1A and 1B illustrate an exemplary system for verifying operation of a transformed HDL model of an electronic circuit using a simulator according to embodiments of the present disclosure.

FIGS. 2A and 2B illustrate a method for verifying operation of a transformed HDL model according to embodiments of the present disclosure.

FIG. 3 illustrates components used for software-based simulation according to embodiments of the present disclosure.

FIGS. 4A-4C illustrate electronic circuit elements used in software-based emulation according to embodiments of the present disclosure.

FIGS. 5A-5E illustrate systems and methods for creating a sequence of storage elements according to embodiments of the present disclosure.

FIGS. 6A-6G illustrate techniques for transforming circuit elements according to embodiments of the present disclosure.

FIGS. 7A-7D illustrate systems and methods for reducing the area required for storage elements used for emulation according to embodiments of the present disclosure.

FIGS. 8A-8C illustrate systems and methods for absorbing storage elements into larger storage elements according to embodiments of the present disclosure.

FIGS. 9A-9B illustrate systems and methods for remapping transformed signals according to embodiments of the present disclosure.

FIGS. 10, 11, and 12 illustrate computers and networks for emulating an HDL model of an electronic circuit according to embodiments of the present disclosure.

DETAILED DESCRIPTION

The present disclosure relates to software-based simulation of a hardware-description language (HDL) model of an electronic circuit, such as a microprocessor, application-specific integrated circuit (ASIC), or any other such circuit. In particular, the present disclosure relates to using one or more of the techniques described herein to (a) increase the speed of the emulation of the HDL model, (b) reduce the number of circuit elements (i.e., digital-logic circuitry) required to model the behavior of the HDL model, and/or (c) reduce the area required to implement the HDL model using the programmable computer hardware. For example, use of one or more of the techniques described herein may determine a transformed HDL model required to simulate the electronic circuit, thereby increasing the speed and/or reducing the cost, complexity, size, required power, and/or build time of the emulation system. The resultant simulation system may be tested by applying patterns of signals to inputs of all or part of the model and by observing some or all corresponding output signals.

In various embodiments of the present disclosure, one or more of the circuit-element transformation techniques described herein (as described in greater detail below with reference to, for example, FIGS. 6A-6G) may be applied by a model-transformation system to the HDL description to create one or more sequences of storage elements, such as flip-flops. As the term is used herein, the sequence includes two or more storage elements, configured such that an output of a first storage element connects to an input of a second storage element, an output of the second storage element connects to an input of a third storage element, and so on. Data received by the input of the sequence of storage elements is thus the same as the output of the sequence of storage elements; the output is a time-delayed version of the input as determined by one or more control signals, such as a clock signal received by the flip-flops. The model-transformation system may create one or more of these sequences of storage elements such that they cross a boundary between a first group of circuit elements (referred to herein as a partition) and a second group of circuit elements or partition. The first partition determines the input to the sequence of storage elements; this partition may be referred to as the upstream partition. The second partition receives the output of the sequence of storage elements; this partition may be referred to as the downstream partition. As described in greater detail below (by, for example, FIGS. 5D and 5E), the downstream partition may access data stored in some or all of the storage elements of the sequence of storage elements by, for example, storing the contents of the sequence of storage elements in a FIFO. This FIFO may allow the downstream partition to continue processing data to account for a latency between elements and/or even if the upstream partition is temporarily unable to produce further output data (i.e., a “stall”) and thus allow the overall system to run the emulation more quickly.
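By way of non-limiting illustration, the behavior of a sequence of storage elements may be sketched in software as a shift register. The following Python sketch is an assumption-laden model, not the disclosed implementation; the class name StorageSequence, its methods, and the reset value are hypothetical. It shows that the output is a time-delayed copy of the input and that the values held in the sequence are the values that will appear at its output in future clock cycles.

```python
from collections import deque


class StorageSequence:
    """Hypothetical model of a chain of flip-flops sharing one clock."""

    def __init__(self, depth, reset_value=0):
        # stages[0] is the first (most upstream) storage element,
        # stages[-1] is the last one, whose value drives the output.
        self.stages = deque([reset_value] * depth, maxlen=depth)

    def clock(self, data_in):
        """Advance one clock cycle and return the value visible at the output."""
        data_out = self.stages[-1]
        # appendleft on a full bounded deque discards the rightmost entry,
        # which mirrors every stage handing its value to the next stage.
        self.stages.appendleft(data_in)
        return data_out

    def peek(self):
        """Values that will reach the output in future cycles, soonest first."""
        return list(reversed(self.stages))


seq = StorageSequence(depth=3)
outputs = [seq.clock(v) for v in [1, 2, 3, 4, 5]]
```

In this sketch, the first three outputs are the reset value and the inputs then emerge three cycles late; a downstream partition with access to peek() could read values before they reach the output.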

Referring first to FIG. 1A, a model-transformation system 102 may process an HDL description 120 (which describes an electronic circuit to be tested in the HDL) to determine a transformed HDL description 122. As described in greater detail herein, the model-transformation system 102 may process the HDL description 120 to determine that one or more groups of circuit elements are able to be transformed (e.g., migrated and/or retimed) as described herein to thereby improve the speed and/or reduce the area of the hardware-based emulator 108. The transformed HDL description 122 may describe the electronic circuit in the same or different computing language as used by the HDL description 120. The transformed HDL description 122 may include a portion of the HDL description 120 that has not been so transformed; the transformed HDL description 122 may omit other portions of the HDL description 120 that have been transformed as well as a description of any transformed circuit elements.

A model-synthesis system 104 may process the transformed HDL description 122 to determine a synthesized HDL description 124. While the HDL description 120 may contain (for example) Boolean logic equations and similar statements, the synthesized HDL description 124 may contain corresponding representations of circuit elements (e.g., AND and OR gates that implement a Boolean logic equation). The model-synthesis system 104 may perform basic logic optimization (e.g., combining or merging circuit elements when possible) and/or simple retiming (e.g., reconfiguring circuit elements to solve a critical-path or race condition). Examples of model-synthesis systems include the STRATUS and GENUS systems provided by Cadence Design Systems of San Jose, Calif., USA. One of skill in the art will understand that any model-synthesis system is within the scope of the present disclosure.

A model-emulation system 106 may process the synthesized HDL description 124 to program one or more programmable hardware components (e.g., FPGAs) of a hardware-based emulator 108. The hardware-based emulator 108 may further contain software component(s) for providing inputs to, and processing outputs from, the one or more programmable hardware components. The hardware-based emulator 108 is described in more detail with reference to FIG. 1C.

The model-transformation system 102, the model-synthesis system 104, and the model-emulation system 106 may communicate with each other and/or a user device 110 via a network 100; this network 100 may be a wired, wireless, or any other type of network. A user 112 may provide input 132 to the user device 110 to initiate transformation of the HDL description 120 by the model-transformation system 102, to initiate synthesis of the transformed HDL description 122 by the model-synthesis system 104, and/or to initiate programming and/or execution of the synthesized HDL description 124 by the model-emulation system 106. The user 112 may further receive output 130 from the user device 110 indicating feedback from the systems 102, 104, 106. The user 112 may further interact with the model-emulation system 106 via the user-device input 132 and the user-device output 130 to initialize emulation, execute emulation, and/or view the results of an emulation.

Any number of user devices 110 and users 112 may so communicate with the systems 102, 104, 106. The systems 102, 104, 106 may be disposed on one or more remote systems 1100, as shown in FIG. 11. Input data 132 may specify the behavior of input signals, such as the times at which a particular input signal is represented by a logical “0” or a logical “1.” Operation of the emulator 108 includes applying the specified input signals to one or more elements of the emulator, determining the outputs of those elements, and similar processing of those outputs by additional elements. The output data of the computer model may be displayed on a display of the user device 110 and/or stored in computer data, such as a text data file and/or trace data file. The emulator 108 may be interactive, and may simulate only portions of the model derived from the synthesized HDL description 124 and/or only certain periods of time of operation of the model per user (and/or other) control input.
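As a non-limiting illustration of input data that specifies when a signal is a logical “0” or a logical “1,” the following Python sketch applies a small stimulus to a single AND gate and records its output at each time step. The stimulus layout (a mapping from time to value per signal), the signal names, and the helper names value_at and and_gate_trace are hypothetical and chosen only for this example.

```python
def value_at(changes, time):
    """Return the most recent value scheduled at or before `time`.

    `changes` maps a time to the logical value the signal takes from that
    time onward; an entry for time 0 is assumed to exist.
    """
    latest = max(t for t in changes if t <= time)
    return changes[latest]


# Hypothetical stimulus: signal name -> {time: logical value}.
stimulus = {
    "a": {0: 0, 2: 1},  # "a" is 0 until time 2, then 1
    "b": {0: 1, 3: 0},  # "b" is 1 until time 3, then 0
}


def and_gate_trace(stimulus, end_time):
    """Apply the stimulus to one AND gate and record its output per time step."""
    return [
        value_at(stimulus["a"], t) & value_at(stimulus["b"], t)
        for t in range(end_time + 1)
    ]


trace = and_gate_trace(stimulus, end_time=4)
```

The resulting list is a simple signal trace of the kind that could be written to a trace data file or rendered by a waveform viewer.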

Waveform viewers of the user device 110 may be used to represent model output data, such as signal traces, as signal waveforms. Waveform viewers may be categorized as timeline views, where time is represented on one of the two axes of the two-dimensional display, and the state of the model is represented on the other axis. A state view may be used to represent machine state at a point in time. State viewers may provide interactive controls to allow a user to adjust which period of time is displayed. In synchronous logic, controlled by a clock, time may be expressed discretely as clock cycles. Timeline views and state views are not a mutually exclusive categorization. More generally, state views may represent a window of time in the neighborhood of a reference time. A timeline view with a reference time, such as a waveform view with a cursor (vertical line at the reference time), can also be considered a state view.

The outputs of the emulator 108 may further be used to represent emulation behavior as a state view by, for example, annotating displayed wires/arcs with values from the emulation. Unlike waveform viewers, these representations may require knowledge of the emulation, not just the signal trace.

In some embodiments, the model-transformation system 102 receives emulation-environment data 126 from the model-emulation system 106. This data 126 may include indications of the number and types of programmable hardware components available in the hardware-based emulator 108, in addition to latencies that exist therebetween. As explained in greater detail below, the emulation-environment data 126 may be used to determine partition boundaries in the transformed HDL description 122. By determining partition boundaries that correspond to the hardware-emulator latencies, the overall cycle time of the emulator may be reduced.

FIG. 1B illustrates a number of software components of the model-transformation system 102 relevant to the present disclosure. Some or all of the components may instead or in addition be disposed on the user device 110. Other components of the user device 110 and model-transformation system 102 are illustrated in FIGS. 10 and 11; one of skill in the art will understand that the user device 110 and model-transformation system 102 may contain still other components.

A model-partitioning component 142 may parse the HDL description 120 to determine two or more partitions, each with corresponding partition boundaries and associated groups of circuit elements. As described in greater detail below with respect to FIG. 3B, the model-partitioning component 142 may select partitions and boundaries such that a sequence of storage elements (e.g., sequence of flip-flops) may span the partition boundary and allow a partition downstream of the sequence of storage elements to use the data stored in the flip-flops (and/or a FIFO derived therefrom). This access to the sequence may enable the downstream partition to continue to process data even if an inter-FPGA latency is associated with the partition boundary and/or if the partition upstream of the sequence of storage elements stalls due to, e.g., an unmet data dependency or other such dependency (collectively referred to as “backpressure”). In some embodiments, the model-partitioning component 142 determines partition boundaries to correspond to inter-FPGA latencies as defined by the emulation-environment data 126. Partition boundaries and the selection thereof are discussed in further detail with respect to FIG. 3B.

A sequence-of-storage-elements determination component 144 may parse the HDL description 120 to identify groups of circuit elements that correspond to one or more circuit-element transformations (candidate transformations are explained in greater detail with respect to FIGS. 6A-6G). The sequence-of-storage-elements determination component 144 may then select and perform one or more transformations to create one or more sequences of storage elements (e.g., sequences of flip-flops) that span or abut one or more partition boundaries. The sequence-of-storage-elements determination component 144 is described in greater detail below with respect to FIGS. 5A-5E.

A circuit-element transformation component 146 may apply one or more transformations as directed by the sequence-of-storage-elements determination component 144 and/or the storage-element size-reduction component 148. The various transformations are described in greater detail below with respect to FIGS. 6A-6G.

The storage-element size-reduction component 148 may process the HDL description 120 to identify structures of storage elements, such as FIFOs and/or queues, and absorb (e.g., merge) stand-alone storage elements, such as flip-flops, into the FIFOs and/or queues. The storage-element size-reduction component 148 may further perform one or more transforms such that additional storage elements are electrically connected adjacent to the FIFOs and/or queues. The operation of the storage-element size-reduction component 148 is described in greater detail below with respect to FIGS. 7A-7C.

A storage-element absorption component 150 may perform the absorption of the stand-alone storage elements into the FIFOs and/or queues. The storage-element absorption component 150 is described in greater detail below with respect to FIGS. 8A-8C.

A signal-remapping component 152 may be used to map signals created by the model-transformation system 102 in the transformed HDL description 122 to signals in the original HDL description 120. The user 112 may, for example, provide input 132 to the user device 110 requesting a value of a signal in the HDL description 120 at a particular time. If that signal was transformed to use different circuit elements, the signal-remapping component 152 may recursively apply corresponding reverse transforms to derive the value of the signal of the HDL description 120 from the values of signals of the transformed HDL description 122. The signal-remapping component 152 is described in greater detail below with respect to FIGS. 9A-9B.
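By way of non-limiting illustration, the recursive application of reverse transforms may be sketched as follows in Python. The data layout is an assumption made for this example only: transformed_values holds signal values observed in the transformed description, and reverse_transforms maps each original signal that no longer exists to a function and the list of signals it is computed from. All signal names and the function original_value are hypothetical.

```python
def original_value(signal, transformed_values, reverse_transforms):
    """Derive the value of an original-model signal from transformed-model values."""
    # Base case: the signal survived transformation and can be read directly.
    if signal in transformed_values:
        return transformed_values[signal]
    # Otherwise, recursively recover each source signal, then apply the
    # reverse transform recorded for this signal.
    func, sources = reverse_transforms[signal]
    args = [
        original_value(src, transformed_values, reverse_transforms)
        for src in sources
    ]
    return func(*args)


# Hypothetical example: two original signals were removed by transformations
# and must be recomputed from signals present in the transformed model.
transformed_values = {"a_q": 1, "b_q": 1, "c_q": 0}
reverse_transforms = {
    "ab_and_q": (lambda a, b: a & b, ["a_q", "b_q"]),
    "out_q": (lambda ab, c: ab | c, ["ab_and_q", "c_q"]),
}

result = original_value("out_q", transformed_values, reverse_transforms)
```

Here the requested signal depends on another removed signal, so the lookup recurses one level before reaching values that exist in the transformed model.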

FIGS. 2A-2B illustrate methods of using circuit-element transformations in accordance with embodiments of the present disclosure. In various embodiments, the system receives first data representing a hardware-description language (HDL) description of an electronic circuit. In various embodiments, the system may identify, using the first data, a first description of a first storage element configured to process, during a first time period, an input signal to store data associated with the input signal and to output, during a second time period, an output signal representing the data. The system may determine a computer-memory structure configured to store the data and determine a second description of a second storage element configured to store a reference value indicating a location of the data in the computer-memory structure, a topology of the second storage element corresponding to a topology of the first storage element. The system may determine, using the first data and the second description, a software-simulation model corresponding to the HDL description and the second storage element. The system may store, using a first portion of the software-simulation model corresponding to the second storage element, the reference value. The system may process, by the computer-memory structure during the second time period, the reference value to output the data.
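As a non-limiting illustration of a storage element that holds a reference value rather than the data itself, the following Python sketch pairs a simple memory structure with a flip-flop model that stores only an index into that memory. The class names DenseMemory and RefFlipFlop, and the use of a list index as the reference value, are assumptions made for this example and are not taken from the disclosure.

```python
class DenseMemory:
    """Hypothetical computer-memory structure that holds the actual data."""

    def __init__(self):
        self._slots = []

    def store(self, data):
        """Store data and return a reference value (here, a slot index)."""
        self._slots.append(data)
        return len(self._slots) - 1

    def load(self, ref):
        """Process a reference value to output the data it locates."""
        return self._slots[ref]


class RefFlipFlop:
    """Storage element with flip-flop topology that holds only a reference."""

    def __init__(self):
        self.ref = None

    def clock(self, ref_in):
        """Capture a new reference and return the previously held one."""
        ref_out, self.ref = self.ref, ref_in
        return ref_out


memory = DenseMemory()
flop = RefFlipFlop()

# First time period: the data goes to memory; the flop captures its reference.
first_out = flop.clock(memory.store("payload A"))
# Second time period: the flop outputs the reference, which memory resolves.
second_out = flop.clock(memory.store("payload B"))
data = memory.load(second_out)
```

In this sketch the value moving through the flip-flop model is a small integer regardless of how wide the referenced data is.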

FIG. 3 illustrates components used for simulation according to embodiments of the present disclosure. The HDL model 302 (created by processing the HDL description 120) may describe a hierarchical system in which the top level of hierarchy is a number N of processing units, each including the same or different circuit elements. The processing units may be computer processors 308 as described in the HDL description 120; the present disclosure is not limited, however, to only processors as the top-level hierarchical processing units, and any such units, such as processor cores, memories, caches, graphics processors, digital-signal processors, etc., are within the scope of the present disclosure. The processing units may be all the same (e.g., a 4×4 array of identical processors) or different (e.g., a first low-power processor and a second high-performance processor). In the target computing system (as distinguished from the emulator system) to be fabricated using the processing units, significant latencies (e.g., 1-10 microseconds) may be incurred when a signal crosses a boundary between a first processing unit and a second processing unit. This delay may be much greater than a delay between a first circuit element and a second circuit element that exist on the same processing unit.

The transformed model 304 (created by processing the transformed HDL description 122), as described in greater detail herein, may be divided into two or more partitions 310 (by, for example, the model-partitioning component 142). The number of partitions may be M partitions and, in some embodiments, the number of partitions equals the number of processing units in the HDL model 302 and each partition corresponds to a processing unit (e.g., N=M). In other embodiments, as shown in FIG. 3B, a single processing unit may include two or more partitions. Each partition may contain a number of circuit elements.

FIGS. 4A-4C illustrate electronic circuit elements used in hardware-based emulation according to embodiments of the present disclosure. As the terms are used herein, “circuit element” refers to any transistor, logic gate, flip-flop, FIFO structure, queue structure, memory, fan-out connection, fan-in connection, etc., that may be referenced in the HDL description 120. Circuit elements may be divided into two categories: data-altering elements and data-preserving elements. As those terms are used herein, a “data-altering element” includes elements that perform an operation on or otherwise alter data inputs received by the element. Data-altering elements may include, for example, AND gates, OR gates, XOR gates, inverters, or any other such element that performs a Boolean or other such function on its inputs to produce corresponding outputs. A “data-preserving element” includes elements that do not alter data inputs and output a (perhaps time-delayed) signal that corresponds to the input signal. Data-preserving elements may include, for example, flip-flops, multiplexers, fan-in connections, fan-out connections, or other such structures.

Referring to FIG. 4A, a first data-altering element A 402 may include, for example, a first element 1 430 (e.g., an AND gate), a second element 2 432 (e.g., an inverter), and a third element 3 434 (e.g., an OR gate). The first data-altering element A 402 may receive inputs 410, 412, 414 and determine an output 416 that may differ from any of the inputs 410, 412, 414.

A second data-altering element B 404 may include, for example, a first element 1 440 (e.g., an AND gate), a second element 2 442 (e.g., an inverter), and a third element 3 444 (e.g., an OR gate). The second data-altering element B 404 may receive inputs 420, 422 and fixed input 424 and determine an output 416 and a fixed output 428. The present disclosure is not limited to data-altering elements with only the illustrated circuit elements; data-altering elements containing any number or type of similar circuit elements, and having any number of inputs or outputs, are within its scope.

The data-altering elements B 404 include a fixed input 424 and a fixed output 428. As the term is used herein, a “fixed” input or output is a signal that may not be transformed, migrated, or otherwise retimed using the techniques described herein. Examples of fixed inputs or outputs include processor (or other discrete element) inputs or outputs, inputs or outputs received or sent to non-programmable hardware or unchangeable software, or other such inputs or outputs. As explained below with reference to FIG. 6G, a data-altering element receiving or generating fixed inputs or outputs may be transformed into a first subset of circuit elements that receive or generate the fixed inputs and outputs and a second subset of circuit elements that do not receive or generate the fixed inputs or outputs.

Referring to FIG. 4B, data-preserving elements 450 may include one or more circuit elements that do not alter the data received as inputs. Such elements may include a first element 1 462 (e.g., a multiplexer), a second element 2 464 (e.g., a flip-flop), and/or a third element 3 466 (e.g., a fan-out connection). Other such elements, such as a fan-in connection, FIFO, queue, etc., are within the scope of the present disclosure. As explained in greater detail herein, a group of circuit elements containing data-altering elements and data-preserving elements may be transformed such that a data-preserving element originally downstream of the data-altering element is retimed to be upstream of the data-altering element (and vice versa).

FIG. 4C defines transaction elements 470a . . . 470N as groups of circuit elements that comprise one or more data-preserving elements 450 and one or more data-altering elements 402 (with appropriate connections therebetween). Each transaction element 470a . . . 470N receives, as input, either a fixed input 472 or the output of another transaction element 470a . . . 470N. Each transaction element 470a . . . 470N further provides, as output, either an input to another transaction element 470a . . . 470N or a fixed output 474. Each transaction element 470a . . . 470N may not generate or receive further fixed outputs or inputs. The transformation techniques described herein may thus be applied to some or all of the transaction elements 470a . . . 470N. In some embodiments, the data represented by the fixed inputs 472 may be referred to as a “transaction”; the data may correspond to a group of related signals that flow through the transaction elements 470a . . . 470N. Example transactions include instructions, packets, flits, etc.

FIGS. 5A-5E illustrate systems and methods for creating a sequence of storage elements by the sequence-of-storage-elements determination component 144 according to embodiments of the present disclosure. Referring first to FIG. 5A, a partition 500 may include groups of circuit elements that include a first group of data-preserving and data-altering elements A 501 and a second group of data-preserving and data-altering elements C 503. These elements 501, 503 may include any type of circuit element, including flip-flops, gates, multiplexers, FIFOs, or other such elements. The elements 501, 503 may be separated by one or more storage elements 502, 504, which may be flip-flops. The storage elements 502, 504 may be pipeline or “staging” flip-flops and may be present in the HDL description 120 to divide the elements 501, 503 such that they meet cycle-time requirements. The sequence-of-storage-elements determination component 144 may parse the HDL description 120 to determine that the elements 501, 502, 503, 504 are candidates for creation of a sequence of storage elements based on (for example) their composition and connections.

In FIG. 5B, the sequence-of-storage-elements determination component 144 may apply, via the circuit-element transformation component 146, one or more of the transformations described herein to modify the elements 501, 503. As illustrated, the element 501 is transformed to retime at least one storage element A′ 506 from the original data-preserving and data-altering elements A 501 such that its output is the input to the storage element B 502, leaving behind modified data-preserving and data-altering elements A′ 505 (that do not include the storage element A′ 506). The sequence-of-storage-elements determination component 144 may similarly transform the data-preserving and data-altering elements C 503 to determine data-preserving and data-altering elements C′ 507 and a storage element C′ 508.

Although only two such transformations are shown in FIGS. 5A and 5B, the sequence-of-storage-elements determination component 144 may apply similar transformations to any number of elements. The present disclosure is similarly not limited to transformations in only one partition; any number of partitions may include circuit elements that may be similarly transformed.

Referring to FIG. 5C, the sequence-of-storage-elements determination component 144 may then transform the elements of the partition 500, using one or more of the techniques described herein, such that the data-preserving and data-altering elements 505, 507 are disposed together on the upstream end of the path and such that the storage elements 506, 502, 508, 504 are disposed together on the downstream end of the path.

Referring to FIG. 5D, a downstream partition 510 may receive the partition output 511 of the upstream partition 500. In accordance with embodiments of the present disclosure, however, because the storage elements 502, 504, 506, 508 are positioned in accordance with the disclosure of FIGS. 5A-5C, and because the storage elements 502, 504, 506, 508 are data-preserving elements, the storage elements 502, 506, 508 store values that will be output by the partition output 511 in future clock cycles. The outputs of the storage elements 502, 504, 506, 508 may be stored in a FIFO 515 in the downstream partition 510 (or in any other such storage element(s)).

FIG. 5D illustrates a sequence of storage elements including four storage elements 506, 502, 508, 504. The present disclosure is not, however, limited to only this number of storage elements, and any number may be disposed in a sequence of storage elements. The more storage elements in the sequence, the greater the inter-FPGA latency that may be compensated for. The storage elements 502, 504, 506, 508 may be modeled by a FIFO, which may then provide the values of the storage elements 502, 504, 506, 508 to the downstream partition 510.

As described above, the sequence-of-storage-elements determination component 144 may dispose the sequence of storage elements formed by the storage elements 502, 504, 506, 508 such that it corresponds to a connection between programmable hardware components. The disposition of the sequence of storage elements may thus wholly or partially compensate for the limit to clock cycle time imposed by the connection. For example, if the latency between two FPGAs is 100 clock cycles, the sequence of storage elements may effectively “pipeline” that latency such that a higher clock frequency is possible.

FIG. 5E illustrates that the storage elements 502, 504, 506, 508 may be replaced by a FIFO 515. The FIFO 515 may be modeled in hardware or software. The downstream partition 510 may request values from the FIFO 515 rather than from the partition 500 to compensate for inter-FPGA latency and/or to compensate for a stall of the upstream partition 500.
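By way of non-limiting illustration, the effect of a FIFO on a downstream partition during an upstream stall may be sketched as follows in Python. The function name simulate, the use of None to mark a stalled cycle, and the unbounded FIFO capacity are assumptions made for this example only. The FIFO is pre-loaded with the values that the replaced storage elements would have held.

```python
from collections import deque


def simulate(fifo_contents, upstream_outputs):
    """Return what the downstream partition consumes in each cycle.

    `fifo_contents` are values already held (oldest first), standing in for
    the contents of the replaced storage elements. `upstream_outputs` has one
    entry per cycle; None marks a cycle in which the upstream partition
    stalls. A None in the result marks a cycle in which downstream starves.
    """
    fifo = deque(fifo_contents)
    consumed = []
    for value in upstream_outputs:
        if value is not None:
            fifo.append(value)          # upstream produced a value this cycle
        consumed.append(fifo.popleft() if fifo else None)
    return consumed


upstream = [14, None, None, 15, 16]     # two-cycle stall after the first value

with_fifo = simulate([10, 11, 12, 13], upstream)
without_fifo = simulate([], upstream)
```

With four buffered values the downstream partition consumes a value in every cycle of this sketch, whereas with nothing buffered it starves for the two stalled cycles.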

FIGS. 6A-6G illustrate techniques for transforming circuit elements using the circuit-element transformation component 146 according to embodiments of the present disclosure. In these figures, the circuit-element transformation component 146 may parse the HDL description 120 to determine which, if any, circuit elements match the topology of any of the transformations described herein and may thereafter apply the one or more transformations. The transformations are reversible in the sense that they may be applied in either direction (as indicated by the double arrows in the figures). The circuit-element transformation component 146 may be directed to apply a transformation by, for example, the sequence-of-storage-elements determination component 144 and/or the storage-element size-reduction component 148.

Referring first to FIG. 6A, a first group of data-altering elements 602 (e.g., combinational logic) may be disposed upstream of a group of data-preserving elements 604 (e.g., one or more flip-flops). The elements 602, 604 may be migrated as shown such that the data-preserving elements 606 are upstream of the data-altering elements 608 (and vice versa). The two groups of data-altering elements 602, 608 may be equivalent (though they may be represented using different object names in the transformed HDL description 122). The two groups of data-preserving elements 604, 606 may be the same or different in the number of storage elements contained therein; if the data-altering elements 602 differ in their numbers of inputs compared to their numbers of outputs, the size of the data-preserving elements 604 may change accordingly to process the different numbers of inputs and outputs.
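As a non-limiting illustration of migrating data-preserving elements across data-altering elements, the following Python sketch compares a two-input logic function followed by one flip-flop with the same function preceded by two flip-flops. The logic function, the function names, and the reset values are hypothetical; the sketch assumes the reset value of the original flip-flop equals the logic function applied to the reset values of the migrated flip-flops.

```python
def logic(a, b):
    """Hypothetical combinational logic with two inputs and one output."""
    return a & ~b & 1  # a AND (NOT b), restricted to one bit


def logic_then_flop(inputs, reset=0):
    """Original arrangement: logic output is captured by a single flip-flop."""
    q, outputs = reset, []
    for a, b in inputs:
        outputs.append(q)      # value visible during this cycle
        q = logic(a, b)        # captured at the clock edge
    return outputs


def flops_then_logic(inputs, reset=(0, 0)):
    """Migrated arrangement: one flip-flop per logic input, logic afterward."""
    (qa, qb), outputs = reset, []
    for a, b in inputs:
        outputs.append(logic(qa, qb))
        qa, qb = a, b          # both input flops capture at the clock edge
    return outputs


stimulus = [(1, 0), (1, 1), (0, 1), (1, 0)]
before = logic_then_flop(stimulus)
after = flops_then_logic(stimulus)
```

Both arrangements produce the same output sequence in this sketch, while the number of flip-flops changes from one to two because the logic has two inputs and one output.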

FIG. 6B illustrates a transformation of a group of circuit elements including a multiplexer 610, data-preserving elements 612, and data-altering elements 616. Similar to FIG. 6A, the data-altering elements 616 may be transformed into data-altering elements 618 that appear at the input of a multiplexer 620 rather than at the output of the data-altering elements 616. Data-preserving elements 622 may similarly change in size with respect to the data-preserving elements 612 to account for differences in the number of inputs versus outputs of the data-altering elements 618.

FIG. 6C illustrates a similar transformation involving a group of circuit elements that includes an array 624 having array entries 628a . . . 628n (which may be considered data-preserving elements). Data-altering elements 626 downstream of the array may be similarly transformed to be data-altering elements 634 disposed upstream of an array 636 having array entries 638a . . . 638n. The reverse transformation is also within the scope of the present disclosure.

FIG. 6D illustrates a similar transformation involving a group of circuit elements that includes a ring 640 having ring elements 642a . . . 642n (which may be considered data-preserving elements). Data-altering elements 644 providing inputs to each ring element 642 may be similarly transformed to be data-altering elements 650 processing outputs of a ring 646 having ring entries 648a . . . 648n. The reverse transformation is also within the scope of the present disclosure.

FIG. 6E illustrates a transformation of a number of identical circuit element instances 652 n (each containing identical circuit elements) into a group of circuit elements that re-uses a single instance of data-altering elements 656 via time-domain multiplexing. Each instance 652 also includes data-preserving elements 654. The instances 652 may be transformed as shown such that N instances of the data-preserving elements 658 a . . . 658 n provide an input to the single instance of the data-altering elements 662. The time-domain multiplexing of the data-altering elements may be performed by using a multiplexer 660 to select one of N inputs corresponding to each time domain and by a sampler 664 to select an output of the data-altering elements 662 corresponding to the time domain. Each of the data-preserving elements 658 thus also stores a value corresponding to each time domain.
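By way of illustration only, the time-domain multiplexing of FIG. 6E may be sketched in Python as follows. The function names and the example logic function are assumptions made for this sketch; the sketch further assumes that the instances are independent of one another and that each instance feeds its registered value back through its own logic. Under those assumptions, N replicated instances stepped for a number of cycles and a single shared logic instance stepped for N times as many time slots may reach the same stored values.

```python
def simulate_replicated(states, f, cycles):
    """N identical instances, each with its own register and its own copy of f."""
    states = list(states)
    for _ in range(cycles):
        states = [f(s) for s in states]
    return states


def simulate_time_multiplexed(states, f, cycles):
    """One shared copy of f; one stored value is selected per time slot."""
    states = list(states)
    n = len(states)
    for slot in range(cycles * n):
        i = slot % n            # multiplexer select for this time slot
        result = f(states[i])   # the single shared logic instance
        states[i] = result      # sampler writes the result back for slot i
    return states


def example_logic(s):
    """Hypothetical data-altering logic on an 8-bit value."""
    return (3 * s + 1) % 256


initial = [1, 2, 3, 4]
replicated = simulate_replicated(initial, example_logic, 5)
multiplexed = simulate_time_multiplexed(initial, example_logic, 5)
```

The trade made in this sketch is one of area for time: the shared logic is instantiated once, but N time slots are needed to advance every stored value by one cycle of the original circuit.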

FIG. 6F illustrates a transformation of a number of circuit elements including data-altering elements 668, 670 separated by storage elements 666 a, 666 b, 666 c. Note that the storage elements may be generalized to be data-preserving elements and are depicted as flip-flops merely for clarity. The output of the data-altering elements 670 and of the storage element 666 c feed back to be inputs to the data-altering elements 668. Though the figure depicts two groups of data-altering elements 668, 670 and two feedback paths, the figure may be generalized to include any number of groups of data-altering elements, storage elements, and feedback paths.

The transformation includes removing the storage element 666 b and directly connecting data-altering elements 674 to data-altering elements 676. To compensate for this removal, an additional storage element 672 a is added to the input to the data-altering elements 674 and additional storage elements 672 b, 672 c are added to the feedback outputs of the data-altering elements 676 and the storage element 666 c. If the circuit elements include additional groups of data-altering elements, corresponding additional storage elements may be added in the same locations.

FIG. 6G illustrates a transformation involving a group of data-altering elements 680 that receives inputs 682 via a storage element 694 a and determines outputs 686 via a storage element 694 b. The data-altering elements 680 also receive one or more fixed inputs 684 and/or determine one or more fixed outputs 688. As explained herein, due to the fixed inputs 684 and/or fixed outputs 688, the data-altering elements 680 may not be transformed using the storage elements 694 a, 694 b.

The circuit-element transformation component 146 may thus transform the data-altering elements 680 into two (potentially non-overlapping) subsets: a first subset of data-altering elements 680 a required to determine the fixed output 688 using the inputs 682 and fixed inputs 684 and a second subset of data-altering elements 680 b that neither receives the fixed input 684 nor determines the fixed output 688 yet determines the output 686. The storage element 696 b, data-altering elements 680 b, and/or storage element 696 c may thus be retimed using one or more of the techniques described herein.

FIGS. 7A-7D illustrate systems and methods for reducing the area required for storage elements used for emulation, using the storage-elements size reduction component 148, according to embodiments of the present disclosure. Referring first to FIG. 7A, the storage-elements size reduction component 148 may first process the HDL description 120 to identify a number of circuit elements that are candidates for storage-element area reduction. For example, as shown in FIG. 7A, the circuit elements may include data-preserving elements 702, 706 and data-altering elements 704, 708 arranged as shown. Any number of data-preserving and data-altering elements and any arrangement thereof is within the scope of the present disclosure, however. In some embodiments, the elements neither receive fixed inputs nor determine fixed outputs (e.g., they process transactions as described in, for example, FIG. 4C).

As shown in FIG. 7B, one or more of the data-altering elements 704, 708 may be retimed by moving them upstream (e.g., to the left of the figure) across one or more data-preserving elements 702, 706 in accordance with the techniques described herein. In some embodiments, the first group of data-altering elements 704 may be wholly or partially merged with the second group of data-altering elements 708; this merging may result in an overall reduction of circuit elements and/or the area required to program the circuit elements into the programmable hardware element(s). Though FIG. 7B illustrates one such transformation, embodiments of the present disclosure include similarly transforming any number of data-altering elements upstream of any number of data-preserving elements.

FIG. 7C illustrates that the (potentially) merged and/or transformed data-altering elements may then be transformed such that they are moved downstream across one or more data-preserving elements. As illustrated, the (potentially) merged data-altering elements 704 and data-altering elements 708 are transformed such that they are moved downstream of the data-preserving elements 706. Such transformations may reduce the size required to store data in the data-preserving elements 702, 706 as shown in FIG. 7D. Although the upstream transformations/merging illustrated in FIG. 7B and the downstream transformations of FIG. 7C are illustrated as two steps, one of skill in the art will understand that, in other embodiments, these two steps may be performed as a single step. Any method of such transformation and merging is within the scope of the present disclosure.

Referring to FIG. 7D, the data-preserving elements 702, 706 may be replaced by data-reference storage elements 712 and dense data storage elements 714 to thereby reduce the overall circuit area required to store the input data 722. Storing data in the data-preserving elements 702, 706 may be expensive in FPGA area; if, for example, the input data 722 is 1024 bits wide, and if the data-preserving elements 702, 706 correspond to 10 cycles of latency, the total number of flip-flops required to implement the data-preserving elements 702, 706 may be 1024×10=10,240 flip-flops; this number of flip-flops may be impossible to allocate on a single FPGA and/or may consume an unacceptable number of FPGA resources.

Embodiments of the present invention thus store only a reference value in the flip-flops (or other such less-dense storage elements); this reference indicates an entry in dense data storage elements 714, which hold the actual data values. The dense data storage elements 714 may include dense storage such as a computer-memory structure (e.g., an array, FIFO, and/or queue implemented in computer memory), dedicated array hardware on the FPGA, or other such dense memory. “Dense” storage is defined as any storage that is capable of storing a number of data signals in such a way that the dense storage consumes less FPGA area and/or resources than less-dense storage (e.g., flip-flops) configured to store the same number of data signals.

In various embodiments, a data-reference determination component 716 may receive input data 722 and may generate reference data corresponding to an entry in the dense data-storage elements 714 holding the data 722. The size of the reference data (e.g., the number of bits required for the reference value) may correspond to the number of entries required in the dense data-storage elements 714. The number of entries may depend on the number of cycles of latency of the data-preserving elements 702, 706.

The dense data-storage elements 714 may store the input data in an entry corresponding to the determined reference value. The data-reference storage elements 712 may then process the determined reference value by, for example, storing it in a sequence of flip-flops corresponding to the original latency of the data-preserving elements 702, 706. The data-reference storage element 712 may thus have the same topology as the original data-preserving elements 702, 706. The topology of a circuit refers to the particular types of circuit elements therein and their particular connections therebetween. For example, if the data-preserving elements 702, 706 include one or more multiplexers, branches, joins, queues, FIFOs, etc. connected a particular way, the data-reference storage element 712 may have corresponding elements connected the same way. The data-reference storage element 712 may differ from the data-preserving elements 702, 706 only in the number of storage elements contained therein; the data-reference storage element 712 may have a number of flip-flops, for example, much less than the number of flip-flops contained in the data-preserving elements 702, 706 (at least because the data-reference storage element 712 stores only references to the data, while the data-preserving elements 702, 706 store the actual data).

When the determined reference data 718 is output by the data-reference storage elements 712, it may be used to indicate the corresponding data value as stored in the dense data-storage elements 714. The dense data-storage elements 714 may then output the input data 722 as delayed input data 724 (e.g., delayed in accordance with the original latency), which may then be processed by the data-altering elements 704, 708.

Continuing the above example, if the input data is 1024 bits wide and if the latency of the data-preserving elements 702, 706 is ten cycles, the reference data may be determined to be 10 bits in size to address at least 10 entries in the dense data-storage elements 714. The number of flip-flops required for the data-reference storage elements 712 may thus be 10×10=100 flip-flops (as compared to the 10,240 flip-flops required to implement the data-preserving elements 702, 706), resulting in a large savings in FPGA area/resources.
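By way of illustration only, the arithmetic of the above example may be reproduced in Python as follows. The function names are assumptions made for this sketch. The sketch also computes the smallest reference width able to distinguish a given number of entries; for 10 entries that width is 4 bits, so the 10-bit reference of the example is more than sufficient to address at least 10 entries.

```python
def flip_flops_direct(data_width_bits, latency_cycles):
    """Flip-flops needed when every pipeline stage stores the full data word."""
    return data_width_bits * latency_cycles


def flip_flops_by_reference(reference_bits, latency_cycles):
    """Flip-flops needed when every pipeline stage stores only a reference."""
    return reference_bits * latency_cycles


def min_reference_bits(entries):
    """Smallest number of bits able to distinguish `entries` distinct entries."""
    return max(1, (entries - 1).bit_length())


direct = flip_flops_direct(1024, 10)            # 1024 x 10 from the example
by_reference = flip_flops_by_reference(10, 10)  # 10 x 10 from the example
smallest = min_reference_bits(10)               # minimum width for 10 entries
```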

Furthermore, while each entry in the dense data-storage elements 714 may be at least 1024 bits wide to store each item of input data 722, the dense data-storage elements 714 need not allocate additional entries to account for the latency of the data-preserving elements 702, 706; the dense data-storage elements 714 may store each item of input data 722 only once and then output corresponding delayed input data 724 when the reference data 718 so indicates. In other words, while the data-preserving elements 702, 706 may include a large number of flip-flops to store each of the 1024 bits of the input data 722 in each stage of a pipeline as it propagates through a pipeline, the dense data-storage elements may store the 1024 bits only once and model the delay of the pipeline by outputting the input data 722 only when indicated by the reference data 718.
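By way of illustration only, the behavior described above may be modeled in Python as follows. The class name, the method name, and the round-robin choice of entries are assumptions made for this sketch; the disclosure does not prescribe how reference values are assigned. Each item of input data is written once to a dense array, only the narrow reference moves through the pipeline stages, and the reference leaving the pipeline selects the delayed data to output.

```python
class ReferenceDelayLine:
    """Models a fixed-latency pipeline that carries references instead of data."""

    def __init__(self, latency):
        self.latency = latency
        self.dense = [None] * latency  # dense storage: one entry per in-flight item
        self.refs = [None] * latency   # narrow pipeline stages holding references
        self.next_entry = 0            # round-robin entry allocation (assumption)

    def clock(self, data):
        """Advance one cycle: accept `data`, return the item delayed by `latency`."""
        # The reference leaving the last stage indicates which entry to output.
        out_ref = self.refs[-1]
        delayed = None if out_ref is None else self.dense[out_ref]
        # Store the new item exactly once; the entry just read is safe to reuse.
        ref = self.next_entry
        self.dense[ref] = data
        self.next_entry = (self.next_entry + 1) % self.latency
        # Only the reference shifts through the pipeline stages.
        self.refs = [ref] + self.refs[:-1]
        return delayed


line = ReferenceDelayLine(latency=3)
observed = [line.clock(item) for item in ["a", "b", "c", "d", "e"]]
```

In this sketch the number of dense entries equals the latency, because at most that many items are in flight at once; the entry read in a given cycle is the same entry overwritten later in that cycle.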

Although FIG. 7D illustrates only one replacement of less-dense storage with more-dense storage (e.g., the dense data-storage elements 714), the present disclosure is not limited to only one such replacement, and any number of similar replacements is within its scope.

FIGS. 8A-8C illustrate systems and methods for absorbing storage elements into larger storage elements, using the storage-elements absorption component 150 according to embodiments of the present disclosure. Referring first to FIG. 8A, such a larger storage element may be a FIFO or queue 800; this large storage may be implemented wholly or partially in hardware on an FPGA, meaning that storing a value in the FIFO/queue may be more area-efficient than storing a value in a storage element 812 a, 812 b (such as a flip-flop). The FIFO/queue 800 may include used entries 802 (e.g., the entries to implement the FIFO or queue) as well as unused entries 804. For example, the FIFO or queue may be implemented in hardware to store 64 entries but the particular FIFO or queue may require only 32 entries. The unused entries may thus represent unallocated or wasted resources.

As the terms are used herein, a FIFO is a data structure that stores a first value in accordance with a control input 808 and FIFO/queue control logic 814 (e.g., a first-in value or a most-recently stored value) and outputs a second value in accordance with a control input 808 (e.g., a last-in or least-recently stored value). A queue is a data structure that maintains a first pointer to a “head” of the queue and a second pointer to the “tail” of the queue; when directed by the control signal 808, the queue returns a value at the head of the queue (and updates the first pointer accordingly) and stores a new value to the tail of the queue (and updates the second pointer accordingly).
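By way of illustration only, a queue maintaining head and tail pointers as described above may be sketched in Python as follows. The class name, the method names, the fixed capacity, and the wrap-around of the pointers are assumptions made for this sketch.

```python
class RingQueue:
    """Fixed-capacity queue with a head pointer (reads) and a tail pointer (writes)."""

    def __init__(self, capacity):
        self.entries = [None] * capacity
        self.head = 0   # first pointer: next entry to return
        self.tail = 0   # second pointer: next entry to fill
        self.count = 0  # number of used entries

    def push(self, value):
        """Store a new value at the tail and update the tail pointer."""
        if self.count == len(self.entries):
            raise OverflowError("queue full")
        self.entries[self.tail] = value
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1

    def pop(self):
        """Return the value at the head and update the head pointer."""
        if self.count == 0:
            raise IndexError("queue empty")
        value = self.entries[self.head]
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return value


queue = RingQueue(capacity=4)
for item in (1, 2, 3):
    queue.push(item)
first_out = queue.pop()
queue.push(4)
queue.push(5)
remaining = [queue.pop() for _ in range(4)]
```

In this sketch the entries that are not currently between the head and the tail correspond to unused entries, which is the kind of spare capacity the absorption described herein may put to use.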

FIG. 8A further illustrates that storage elements 812 a, 812 b may be transformed by moving either or both of them upstream or downstream of the FIFO/queue 800. As illustrated, the storage elements 812 a, 812 b are transformed to be downstream of the FIFO/queue 800; in other embodiments, any number of storage elements may be transformed in accordance with the techniques described herein.

Referring to FIG. 8B, the storage elements 812 a, 812 b may be absorbed into the FIFO 820 by allocating some or all of the unused entries 804; the used entries 822 may thus represent entries corresponding to the FIFO or queue as well as entries corresponding to the absorbed storage elements 812 a, 812 b. Additional storage elements 824 a, 824 b may be disposed on the control inputs 808 to account for the absorption of the storage elements 812 a, 812 b. Because the size (e.g., width) of the data inputs 806 is typically much larger than the size (e.g., width) of the control inputs 808, fewer storage elements (e.g., flip-flops) are required to implement the storage elements 824 a, 824 b as compared to the number of storage elements required to implement the storage elements 812 a, 812 b.

In various embodiments, FIFO control logic 826 may process the control inputs 808 to implement the first-in-first-out behavior of the FIFO described above. Because the absorbed storage elements 812 a, 812 b are not part of the FIFO, separate storage-element control logic 828 may be used to replicate the behavior of the storage elements 812 a, 812 b (e.g., capture the input data 806 and provide the output data 810).

The absorbing of the storage elements 812 a, 812 b may not consume all of the unused entries 804 in the FIFO 820; other storage elements may be further absorbed into the FIFO 820. Further, as illustrated, the FIFO 820 absorbs two connected storage elements 812 a, 812 b but, as described herein, transformations may be applied to create larger sequences of storage elements, which may also be absorbed into the FIFO 820.

FIG. 8C is similar to FIG. 8B but illustrates absorption of the storage elements 812 a, 812 b into a queue 830 having used entries 832. Queue control logic 834 may be used to implement the head/tail output/input functions described above, and storage-element control logic 836 may be used to control the entries in the queue corresponding to the storage elements 812 a, 812 b.

FIGS. 9A-9B illustrate systems and methods for remapping transformed signals, using the signal-remapping component 152, according to embodiments of the present disclosure. Referring first to FIG. 9A, a number of transformations 910, 912, 914 may be applied to the HDL description 120 as described herein; these transformations may alter the circuit elements of the HDL description 120 such that original signals 922, 924, 926 of the HDL description 120 no longer appear in the transformed HDL description 122; instead, a number of transformed signals 928, 930, 932 may wholly or partially replace the original signals 922, 924, 926. A user 112 may, however, wish to examine values of the original signals 922, 924, 926 on a display 1018 of the user device 110; the values of the transformed signals 928, 930, 932 may be of no interest to the user 112. The following figures thus describe a system and method for remapping values of the transformed signals to derive the original signals.

Some original signals 920 of the HDL description 120 may not be transformed; these signals thus remain in the transformed HDL description 122 (represented in the figure as the model that results after N steps of transformation) and remapping of these signals is not necessary.

In some embodiments, original signals 922 are transformed by a single transform 1 910 to produce transformed signals 1 928. In other embodiments, multiple transforms may affect the same original signals. For example, original signals 926 may be affected by a transform 2 912; some of the resultant transformed signals 932 may appear in the transformed HDL description 122, while others of the resultant transformed signals 932 may be further transformed by a transform 3 914. The transform 3 914 may further transform additional original signals 924 to produce transformed signals 2 932. Other sequences of transforms that may re-transform already transformed signals and/or transform further original signals are within the scope of the present disclosure.

In various embodiments, as the transformations are applied, data representing the transforms and the input and output signals affected thereby are stored in a computer memory or similar structure. For example, at the transformed model step 1 902, data may be stored corresponding to the transform 2 912, the original signals 922, and the transformed signals 932. At the transformed model step 2 904, data may be stored corresponding to the transform 1 910, the original signals 926, and the transformed signals 1 928. Similarly, at the transformed model step 3 906, data may be stored corresponding to the transform 3 914, the original signals 924, the subset of the transformed signals 932 processed by the transform 3 914, and the transformed signals 4 932.

Given this storage of the transforms 910, 912, 914 and the original signals processed thereby, if and when the user 112 inputs an indication to the user device 110 to display a value of an original signal affected by one or more transforms, the signal-remapping component 152 may use this stored transform data to determine the transformed signals corresponding to the original signal and process the transformed signals by recursively applying reverse transformations corresponding to the performed transformations to derive a value of the original signal. For example, to determine a value of the original signals 4 926, the signal-remapping component 152 may determine, using the stored data, that the corresponding transformed signals are the transformed signals 2 930 (as transformed by the transformation 2 912 and the transformation 3 914). The signal-remapping component 152 may then process the transformed signals 2 930 by first applying a first reverse transformation corresponding to the transform 3 914 and then applying a second reverse transformation corresponding to the transform 2 912.
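By way of illustration only, the recursive application of reverse transformations may be sketched in Python as follows. The record layout, the signal names, and the two example transforms (a bitwise inversion and an increment, each with a known reverse) are assumptions made for this sketch; the transforms of the present disclosure operate on circuit elements and are not limited to such arithmetic. The sketch stores one record per applied transform and, when asked for an original signal, walks forward to a signal present in the transformed model and applies the reverse transforms on the way back, so that the reverse of the later transform is applied first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TransformRecord:
    """Stored data for one applied transform (hypothetical layout)."""
    name: str
    source: str                      # signal consumed by the transform
    result: str                      # signal produced by the transform
    reverse: Callable[[int], int]    # maps a result value back to a source value


def derive_original(signal: str,
                    records: List[TransformRecord],
                    transformed_values: Dict[str, int]) -> int:
    """Derive the value of `signal` from values present in the transformed model."""
    if signal in transformed_values:
        return transformed_values[signal]  # untransformed: no remapping needed
    for record in records:
        if record.source == signal:
            downstream = derive_original(record.result, records, transformed_values)
            return record.reverse(downstream)
    raise KeyError(signal)


records = [
    TransformRecord("transform_2", "orig", "mid", lambda v: v ^ 0xFF),
    TransformRecord("transform_3", "mid", "final", lambda v: (v - 1) % 256),
]
# Forward direction for reference: 0x0F -> invert -> 0xF0 -> increment -> 0xF1.
transformed_values = {"final": 0xF1, "untouched": 7}
recovered = derive_original("orig", records, transformed_values)
```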

The present disclosure is not limited to only these types or numbers of transforms and to the reverse-mapping of these signals; one of skill in the art will understand that any number of transformation steps and any combination of reverse transforms are within its scope.

Referring to FIG. 9B, in some embodiments, the signal-remapping component 152 may not store data corresponding to each step of the transformation process, as described above, but may create a condensed representation of the transforms that directly maps signals of the transformed HDL model 122 (e.g., step N 908 of the transformation process). The signal-remapping component 152 may then recursively apply reverse transforms to the relevant signals in accordance with the condensed representation.

FIG. 10 is a block diagram illustrating a user device 110. FIG. 11 is a block diagram illustrating example components of a remote system 1100, which may be one or more servers and may include the model-transformation system 102, the model-synthesis system 104, and/or the model-emulation system 106. The term “system” as used herein may refer to a traditional system as understood in a system/client computing structure but may also refer to a number of different computing components that may assist with the operations discussed herein. For example, a server may include one or more physical computing components (such as a rack system) that are connected to other devices/components either physically or over a network and are capable of performing computing operations. A server may also include one or more virtual machines that emulate a computer system and are run on one or across multiple devices. A server may also include other combinations of hardware, software, firmware, or the like to perform operations discussed herein. The server may be configured to operate using one or more of a client-system model or other computing techniques.

Multiple servers may be included in the remote system 1100, such as one or more servers for emulating operation of an electronic circuit. In operation, each of these servers (or groups of servers) may include computer-readable and computer-executable instructions that reside on the respective server, as will be discussed further below. Each of these devices/systems 110/1100 may include one or more I/O device interfaces 1002/1102 for enabling communication over the network 100. Each of these devices/systems 110/1100 may include one or more controllers/processors 1004/1104, which may each include a central processing unit (CPU) for processing data and computer-readable instructions, and a memory 1006/1106 for storing data and instructions of the respective device. The memories 1006/1106 may individually include volatile random access memory (RAM), non-volatile read only memory (ROM), non-volatile magneto-resistive memory (MRAM), or other types of memory. Each device/system 110/1100 may also include a data-storage component 1008/1108 for storing data and controller/processor-executable instructions. Each data-storage component 1008/1108 may individually include one or more non-volatile storage types such as magnetic storage, optical storage, solid-state storage, etc. Each device/system 110/1100 may also be connected to removable or external non-volatile memory or storage (such as a removable memory card, memory key drive, networked storage, etc.) through respective input/output device interfaces 1002/1102. The user device 110 may further include an antenna 1012, microphone 1014, loudspeaker 1016, and/or display 1018.

Computer instructions for operating each device/system 110/1100 and its various components may be executed by the respective device's/system's controller(s)/processor(s) 1004/1104, using the memory 1006/1106 as temporary “working” storage at runtime. The computer instructions may be stored in a non-transitory manner in non-volatile memory 1006/1106, storage 1008/1108, and/or an external device(s). Alternatively, some or all of the executable instructions may be embedded in hardware or firmware on the respective device in addition to or instead of software.

Each device/system 110/1100 includes input/output device interfaces 1002/1102. A variety of components may be connected through the input/output device interfaces 1002/1102, as will be discussed further below. Additionally, each device/system 110/1100 may include an address/data bus 1010/1110 for conveying data among components of the respective device/system. Each component within a device/system 110/1100 may also be directly connected to other components in addition to (or instead of) being connected to other components across the bus 1010/1110.

The device 110 may include input/output device interfaces 1002 that connect to a variety of components such as an audio output component, a wired headset, or a wireless headset, or other component capable of outputting audio. The device 110 may also include an audio capture component. The audio capture component may be, for example, the microphone 1014 or array of microphones, a wired headset, or a wireless headset, etc.

Via antenna(s) 1012, the input/output device interfaces 1002 may connect to one or more networks 100 via a wireless local area network (WLAN) (such as WiFi) radio, Bluetooth, or wireless network radio, such as a radio capable of communication with a wireless communication network such as a Long Term Evolution (LTE) network, WiMAX network, 3G network, 4G network, 5G network, etc. A wired connection such as Ethernet may also be supported. Through the network(s) 100, the system may be distributed across a networked environment. The I/O device interface 1002/1102 may also include communication components that allow data to be exchanged between devices such as different physical systems in a collection of systems or other components.

Referring to FIG. 12, as noted above, multiple devices 110 may be employed in a single system. In such a multi-device system, each of the devices may include different components for performing different aspects of the system's processing. The multiple devices may include overlapping components. The components of the device 110 or the system 1100, as described herein, are illustrative, and may be located as a stand-alone device or may be included, in whole or in part, as a component of a larger device or system.

The network 100 may further connect user devices 110 such as a laptop computer 110 a, a desktop computer 110 b, a tablet computer 110 c, and/or a smart phone 110 d through a wireless service provider, over a WiFi or cellular network connection, or the like. Other devices may be included as network-connected support devices, such as a remote system 1100. The support devices may connect to the network 100 through a wired connection or wireless connection. Networked devices 110 may capture audio using one or more built-in or connected microphones 1014 or audio-capture devices, with processing performed by components of the same device 110 or another device/system 110/1100 connected via network 100. The concepts disclosed herein may be applied within a number of different devices and computer systems.

The above aspects of the present disclosure are meant to be illustrative. They were chosen to explain the principles and application of the disclosure and are not intended to be exhaustive or to limit the disclosure. Many modifications and variations of the disclosed aspects may be apparent to those of skill in the art. Persons having ordinary skill in the field of computers will understand that components and process steps described herein may be interchangeable with other components or steps, or combinations of components or steps, and still achieve the benefits and advantages of the present disclosure. Moreover, it should be apparent to one skilled in the art that the disclosure may be practiced without some or all of the specific details and steps disclosed herein.

Aspects of the disclosed system may be implemented as a computer method or as an article of manufacture such as a memory device or non-transitory computer readable storage medium. The computer readable storage medium may be readable by a computer and may comprise instructions for causing a computer or other device to perform processes described in the present disclosure. The computer readable storage media may be implemented by a volatile computer memory, non-volatile computer memory, hard drive, solid-state memory, flash drive, removable disk or other media. In addition, one or more of the components and engines may be implemented in firmware or hardware, such as the acoustic front end, which comprises, among other things, analog or digital filters (e.g., filters configured as firmware to a digital signal processor (DSP)).

Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements or steps. Thus, such conditional language is not generally intended to imply that features, elements, or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without other input or prompting, whether these features, elements, or steps are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list.

Disjunctive language such as the phrase “at least one of X, Y, Z,” unless specifically stated otherwise, is understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present.

As used in this disclosure, the term “a” or “one” may include one or more items unless specifically stated otherwise. Further, the phrase “based on” is intended to mean “based at least in part on” unless specifically stated otherwise.

What is claimed is:
 1. A computer-implemented method for simulating operation of an electronic circuit, the method comprising: receiving first data representing a hardware-description language (HDL) description of an electronic circuit; identifying, using the first data, a first description of a first storage element configured to process, during a first time period, an input signal to store data associated with the input signal and to output, during a second time period, an output signal representing the data; determining a computer-memory structure configured to store the data; determining a second description of a second storage element configured to store a reference value indicating a location of the data in the computer-memory structure, a topology of the second storage element corresponding to a topology of the first storage element; determining, using the first data and the second description, a software-simulation model corresponding to the HDL description and the second storage element; storing, using a first portion of the software-simulation model corresponding to the second storage element, the reference value; and processing, by the computer-memory structure during the second time period, the reference value to output the data.